Richard Yuanzhe Pang

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

Nov 13, 2025
Via arXiv

Transformers Struggle to Learn to Search

Dec 06, 2024
Via arXiv

Self-Generated Critiques Boost Reward Modeling for Language Models

Nov 25, 2024
Via arXiv

Self-Consistency Preference Optimization

Nov 06, 2024
Via arXiv

Self-Taught Evaluators

Aug 05, 2024
Via arXiv

An Introduction to Vision-Language Modeling

May 27, 2024
Via arXiv

Iterative Reasoning Preference Optimization

Apr 30, 2024
Via arXiv

Self-Rewarding Language Models

Jan 18, 2024
Via arXiv

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Nov 20, 2023
Via arXiv